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ABSTRACT 

Here, we address the problem of Independent Subspace 
Analysis (ISA). We develop a technique that (i) builds 
upon joint decorrelation for a set of functions, (ii) can be 
related to kernel based techniques, (iii) can be interpreted 
as a self-adjusting, self-grouping neural network solution, 
(iv) can be used both for real and for complex problems, 
and (v) can be a first step towards large scale problems. 
Our numerical examples extend to a few 100 dimensional 
ISA tasks. 
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2.1 The K-ISA Equations 

We treat real and complex ISA tasks: Let K e {R, C}. 
Assume that we observe the mixture of multidimensional 
independent i.i.d. sampled sources (components): 



z(t)=As(t), s(t) = [s 1 (t);...;s M (t)} 



(1) 



i=l dn 
DxD 



is the total dimension of the corn- 



where D = J2 r 

ponents, A 6 K nxJJ is the invertible mixing matrix. The 
task is to recover the hidden components s rn (t) s K dm by 
means of observations z(t) £ K D . If K = R (K = C) 
then we shall talk about Real (Complex) ISA [i.e., R-ISA 
(C-IS A)] task. The special d m = 1 ( Vm) case is the Real 
(Complex) Independent Component Analysis [i.e., R-ICA 
(C-ICA)]. 
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1 INTRODUCTION 

Uncovering independent processes is of high importance, 
because it breaks combinatorial explosion [10]. In cases, 
like Smart Dust, the problem is vital, because (i) ele- 
ments have limited computational capacity and (ii) com- 
munication to remote distances is prohibitively expen- 
sive. Self-adjusting, self-grouping neural network solu- 
tions may come to our rescue here. Here, we present such 
an approach for Independent Subspace Analysis (ISA). 
The extension of ISA to Independent Process Analysis is 
straightforward under certain conditions 1 10]. 

Our paper is built as follows. The K-ISA model is 
introduced in Section [2] Section [3] is about our method. 
Illustrations are provided in Section^] 



2 THE K-ISA MODEL 

Section 12.11 defines the K-ISA task to be studied, 
tion l2.2l treats the ambiguities of the model. 



Sec- 



2.2 Ambiguities of the K-ISA Model 

Identification of the K-ISA model is ambiguous. How- 
ever, ambiguities are simple: hidden components s m can 
be determined up to permutation among subspaces and 
up to invertible transformation within subspaces. Details 
about R-ISA and C-ISA can be found in 111 and P . 
respectively. 

Ambiguities within subspaces can be lessened: given 
our assumption on the invertibility of matrix A, we can 
assume without any loss of generality that both the sources 
and the observation are white, that is, 



E[s] 
EM 



0, cov [s] 
0, cov [z] 



Id, 
Id, 



(2) 
(3) 



where E[-] denotes the expectation value, Id is the 
Z?-dimensional identity matrix. Now, the s m sources are 
determined up to (i) permutation and orthogonal transfor- 
mation in the real case and (ii) permutation and unitary 
transformation in the complex case. 
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3 K-ISA BY JOINT DECORRELATION 

Components s m are estimated by a neural network, which 
aims to 'decorrelate' (see below) the y m € K dm parts 
of the K D 9 y(t) = [y 1 ^); • ■ ■;y M (t)} output of the 
network. The network executes mapping z i— > L(z, 8) 
with network parameter 0. 



3.1 Neural Network Candidates (L) 

Choosing an RNN with feedforward (F) and recurrent (R) 
connections then the network assumes the form 

f(r) = -y(r) + Fz(t) - Ry(r) (4) 

and thus, upon relaxation it solves the 

y(t) = (Id + Rr X Fz(i) - L(z(t); F, R) (5) 



input-output mapping 1 1, 9]. Another natural choice is a 
network with feedforward connections W that executes 
mapping 



y(i)=Wz(t)=L(z(i);W). 



(6) 



3.2 Cost Function of K-ISA 



The neural network estimates hidden sources s m by non- 
linear (f) decorrelation of y m s, components of network 
output y. Formally: 

Let us denote the empirical f-covariance matrix of 
y(t) and y m (t) for function f = [f 1 ; . . . ; f M ] over [1, T] 
by 

S K (f, T) = cbv (f [^(y)] , f [MY)]) , (7) 
E# (f , T) = cow (P [y* (y 4 )] , f j [m (y 3 )}) , (8) 

, M, v?m(v) = v, (pc is 



respectively, where i, j = 1, 
the mapping 

(fiC ■ C L 3 v i— > v g 



»(0 
9(0 



»2L 



(9) 



Here, -Ji(-) [S(-)] denotes the real (imaginary) part, ® is 
the Kronecker-product. Then minimization of the follow- 
ing non-negative cost function (in 9) 



1 



Q e (f,T) :=--log 



det[S K (f,T)] 



nftidet[S™- m (f,T)]^ 



(10) 



gives rise to pairwise 1 f-uncorrelatedness: 

Theorem 1. For the separation carried out by the net- 
work minimizing cost function JlOt . the following state- 
ments are equivalent: 

i) f-uncorrelatedness: S^- 7 (f , T) = (V« =/= j). 

ii) Qq is minimal: Qq({, T) = 0. 

Proof (sketch). The statement follows from the inequality 
related to the multi-dimensional Shannon differential en- 



tropy H: Let u = [u 1 ; 
a random variable. Then 



u M ] G 



A I 



H(u\...,u M )<J2H(u m ), 



) denote 



(11) 



and equality holds iff u m s are independent. Hint: one 
can choose u as a normal random variable with covari- 
ance X]K(f, T) and insert the expression of the entropy of 
normal variables. 



'We note that - unlike in the the 1 -dimensional case, i.e., 
unlike for d = 1 - pairwise independence is not equivalent to 
mutual independence. Nonetheless, according to our numerical 
experiences it is an efficient approximation. 



Note 1. For the special case K = K, 6 = (F, R), 
f(z) = z and d = 1, see [9]. 

Note 2. Cost function Qq of dlOt is attractive from the 
point of view of computing its gradient. This gradient for 
the case of an RNN architecture [see Eq. (|5}/ may give 
rise to self-organization J^/. 

Note 3. For real random variables, the separation, which 
is aimed by cost function (I10> . can be related to the more 
general principle, the Kernel Generalized Variance ( KGV) 
technique ^3/. This technique aims to separate the y m 
components ofy, the transformed form of input z. To this 
end, KGV estimates mutual information I(y 1 , ■ ■ ■ ,y M ) 
in Gaussian approximation 2 by means of the covariance 
matrix of variable y. Here, the transformation of the KGV 
technique is realized by the neural network parameterized 
with variable and by the function f . 

Note 4. We note that KGV is related to the kernel covari- 
ance (KC) method [7], which makes use of the supremum 
of 1 -dimensional covariances as a measure of indepen- 
dence. Our approximation may also be improved by min- 
imizing <5e(f , T) on J(B f), i.e., on a set of functions. 

3.3 The K-ISA Algorithm 

Below, our proposed K-ISA method is introduced. A 
decomposition principle called K-ISA Separation Theo- 
rem has been formulated in 1 12]. It says that (under cer- 
tain conditions) the K-ISA task can be solved in 2 steps: 
In the first step, 1 -dimensional K-ICA estimation is ex- 
ecuted that provides separation matrix Wk-ica and esti- 
mated sources sr-ica- In the second step, optimal permu- 
tation of the K-ICA elements (§k ica) is searched for, the 
K-ICA elements are grouped. 

This principle is adapted to linear feedforward neu- 
ral networks [see, Eq. (|6}] here. 3 Separation matrix 
W = Wk-isa is searched in the form 



W 



K-ISA 



PW, 



-ICA, 



(12) 



where matrix P G M. DxD denotes the desired permuta- 
tion matrix. We search for the hidden sources s m by pair- 
wise decorrelation of the components y m of the output of 
the network using function manifold 2r (J: see, Note©. 
Thus, given Theorem^ our cost function is: 



Q(J,T,P) :=]T||Mo£ K (f,T,P) 



mm . 
p 



(13) 



Here: (i) £F denotes a set of functions, each func- 
tion R D h-> R D (if K = C then R 2D h-> 
M 2£) ), and each function acts on each coordinate sepa- 
rately, (ii) o denotes pointwise multiplication (Hadamard 
product), (iii) M masks according to the subspaces 
f M = Ed — Ijvr <8> where all elements of matrix 



Ed G 



»DxD 



and Ed € 



pdxd 



are equal to 1 [if ' 



"A complex variable is normal if its image using map ipc is 



Thus, relation J(y , 



real multivariate normal 

/[^(y 1 ), . . . , ipc(y M )] y m e C a extends the KGV based 
interpretation to the complex case, too [see Eqs. Q-{8)]. 

3 For the sake of simplicity we assume that all components 
have the same dimension, i.e., d — d m (Vm). 



Table 1 : IK-IS A Algorithm - pseudocode 



Input of the algorithm 

observation: {z{ty\t=i,...,T 
Optimization" 

K-ICA : on whitened observation z, 

=>■ sk-ica estimation 
Permutation search 

P :=Id 
repeat 

sequentially for Vp G S mi , q G S™ 2 
(mi ^ m 2 ) : 
ifg(J,T,P p9 P)<Q(J,T,P) 



end 



P P 



until Q(3 r , T, •) decreases in the sweep above 
Estimation 

§K-ISA = P§K-ICA 



"LetS 1 , 



, S M denote the indices of the I s 



subspaces, i.e., 9 m := {("^ — l)d + 1, . . - , md}, and 
permutation matrix P P9 exchanges coordinates p and g. 



A-Z 



OC ° CO 



Figure 1: Database Auj. 100-dimensional task of 50 
pieces of 2-dimensional components (D = 100, M = 50, 
d = 2). Hidden sources are uniformly distributed vari- 
ables on the letters of the English and the Greek alphabets. 




(a) 



(b) 



Figure 2: Database d-spherical. Stochastic representation 
of the 3 (M = 3) hidden sources, (a): p is uniform on 
[0, 1], (b): p is exponential with parameter p, = 1, and 
(c): p is lognormal with parameters p, = 0, <r = 1, respec- 
tively. 



then E D (E^ is replaced by E 2D (E 2d )]}, (iv) ||.||- de- 
notes the square of the Frobenius norm (sum of squares of 
the elements), (v) in Sjf(f , T, P), y = Psk-ica, and (vi) 
P is the D x D permutation matrix to be determined. 

Greedy permutation search is applied: 2 coordinates 
of different subspace are exchanged if this change lowers 
cost function Q(3 r , T, •). Note: Greedy search could be re- 
placed by a global one for a higher computational burden 
lllll . Table^contains the pseudocode of our technique. 

4 Illustrations 

The K-ISA identification algorithm of Section l3~3l is il- 
lustrated below (due to the lack of space, illustrations are 
provided for K = R only). Test cases are introduced in 
Section RTTl The quality of the solutions will be measured 
by the normalized Amari-distance (Section ^. 2> . Numeri- 
cal results are provided in Section l4~3l 



random variable independent of u^ d ' ( = r denotes equal- 
ity in distribution). This d-spherical database: (i) can 
be scaled in dimension d, (ii) satisfies conditions of the 
K-ISA Separation Theorem, and (iii) can be defined by p. 
(See olal for spherical variables.) Our choices for p are 
shown in Fig. 13 

4.2 Normalized Amari-distance 

The precision of our algorithm was measured by the 
normalized Amari-distance as follows. The opti- 
mal estimation of the K-ISA model provides matrix 
B := WA e K £>x - D , a block-permutation matrix made of 
d x d sized blocks. Let us decompose matrix B G M. DxD 
into d x d blocks: B = [B^'l . . , ... Let tf'i denote 

L J 2 J — 1,...,M 

the sum of the absolute values of the elements of matrix 
B lJ G K dxd . Then the normalized version of the Amari- 
distance 1 2] (as it was introduced in fTlll for K-ISA) is 
defined as: 



4.1 Databases 

Two databases were defined to study our algorithm. The 
databases are illustrated in Fig.n an d Fig- El 

4.1.1 The Auj Database 

Here, hidden sources s m are uniform distributions defined 
by the 2-dimensional images (d — 2) of letters on A-Z 
and a — u>. This is called database Au>, which has 50 
components (M = 50), see Fig^for an illustration. This 
test falls outside of the (known) validity domain of the 
K-ISA Separation Theorem. 

4.7.2 The d-Spherical Database 

Here, hidden sources s m are spherically symmetric 
random variables that have representation of the form 

v d =' pu.( d \ where is uniformly distributed on the 
d-dimensional unit sphere, and p is a non-negative scalar 



HB) 



2M(M - 1) 



M 

E 



max, W 



- 1 



(14) 



For matrix B we have that < r(B) < 1, and r(B) = 
if, and only if B is a block-permutation matrix with d x d 
sized blocks. Thus, r = corresponds to perfect esti- 
mation (0% error), r = 1 is the worst estimation (100% 
error). This performance measure can be used for K = C, 
too. 

4.3 Simulations 

Results on databases Auj and d-spherical are provided 
here. In our simulations, sample number of observations 
z{t) changed: 1000 < T < 30000. Mixing matrix A 



Aco 




10 15 20 
Number of samples (T) 

(a) 



30 
x10 3 



STMWOO 

(b) 



Figure 3: Estimations on database Auj. (a) Amari-error as 
a function of the number of samples. Average ideviation 
for 30000 samples: 0.58% ± 0.04, (b) estimation with av- 
erage error for 30000 samples: the hidden components 
are recovered up to permutation and orthogonal transfor- 
mation (R-ISA ambiguity). 

d-spherical 




5 10 15 20 30 

Number of samples (T) x -| rj 3 

Figure 4: Estimations of database d-spherical: Amari- 
error as a function of the number of samples on loglog 
scale for different dimensional (d) subspaces. Task dimen- 
sion: D. Errors are approximately linear, so they scale 
according to power law, like r{T) oc T~ c (c > 0). For 
numerical values, see Table|2| 



was chosen randomly from the orthogonal group. Mani- 
fold GFwas J := {z h cos(z),z i— > cos(2z)} (functions 
operated on coordinates separately). Scaling properties of 
the approximation were studied for database d-spherical 
by changing the value of d between 20 and 110 [i.e., the 
number of subspaces (M) was fixed, but the dimension 
of the subspaces was increased.] For each parameters [T 
for database Auj, (T, d) for database ri-spherical] ten ex- 
periments were averaged. Qualities of the solutions were 
measured by the Amari-error (see Section l4~2l . We have 
chosen FastICA H for the R-ICA module (see TableQ). 

Precision of our method is shown: (i) for database Aw 
in Fig.[3]as a function of sample number, (ii) for database 
ri-spherical in Fig.0]as a function of sample number and 
source dimension (d) (for details, see Table|4}. The figures 
demonstrate that the algorithm was able to uncover the 
hidden components with high precision. In the case of 
database ri-spherical the Amari error decreases according 
to power law r(T) oc T~ c (c > 0). 

In our numerical simulations, the number of sweeps 
before the iteration of the permutation optimization 
stopped (see TableQ varied between 2 and 6. 



Table 2: Amari-error for database ri-spherical, for differ- 
ent d values: average ± deviation. Number of samples: 
T = 30000. 



d= 20 


d= 30 


d = 40 


1.40% (±0.03) 


1.71% (±0.03) 


1.99% (±0.03) 


d = 50 


d= 60 


d = 70 


2.23% (±0.03) 


2.44% (±0.03) 


2.65% (±0.03) 


d = 80 


ri= 90 


ri= 100 


2.85% (±0.03) 


3.03% (±0.04) 


3.19% (±0.02) 


d = 110 




3.37% (±0.03) 
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